# Nesterov-aided Stochastic Gradient Methods using Laplace Approximation for Bayesian Design Optimization

Carlon, Andre Gustavo, Ben Mansour Dia, Luis FR Espath, Rafael Holdorf Lopez, and Raul Tempone. "Nesterov-aided Stochastic Gradient Methods using Laplace Approximation for Bayesian Design Optimization." arXiv preprint arXiv:1807.00653 (2018).
Carlon, Andre Gustavo, Ben Mansour Dia, Luis FR Espath, Rafael Holdorf Lopez, and Raul Tempone
Optimal Experimental Design, Bayesian Inference, Laplace Approximation, Stochastic Optimization, Accelerated Gradient Descent, Importance Sampling
2018
Finding the best setup for experiments is the main concern of Optimal Experimental Design (OED). We focus on the Bayesian problem of finding the setup that maximizes the Shannon expected information gain. We propose using stochastic gradient descent and its accelerated counterpart, which employs Nesterov's method, to solve the optimization problem in OED. In the spirit of stochastic gradient methods, we pair the acceleration with a restart technique, as O'Donoghue and Candès [9] originally proposed for deterministic optimization. We couple these optimization methods with three estimators of the objective function: double-loop Monte Carlo (DLMC), Monte Carlo with the Laplace approximation of the posterior distribution (MCLA), and double-loop Monte Carlo with Laplace-based importance sampling (DLMCIS). Using stochastic gradient methods together with the Laplace-based estimators makes it affordable to handle expensive and complex models, such as those that require solving a partial differential equation (PDE). From a theoretical viewpoint, we derive an explicit formula for the stochastic gradient of the Monte Carlo estimators that use either the Laplace approximation (MCLA) or Laplace-based importance sampling (DLMCIS). Finally, we study four examples from a computational standpoint: three based on analytical functions and one based on the finite element solution of a PDE. The last example is an electrical impedance tomography experiment based on the complete electrode model. In these examples, the accelerated stochastic gradient method with the MCLA estimator converges to local maxima using up to five orders of magnitude fewer model evaluations than gradient descent with DLMC.
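To make the idea of an accelerated stochastic gradient method with restart more concrete, here is a minimal Python sketch. It is not the authors' implementation. The toy concave objective, the Gaussian gradient noise, the constant step size, the momentum schedule k/(k+3), and all names (`nesterov_sgd_restart`, `noisy_grad`, `X_STAR`) are assumptions chosen for illustration. The restart test is the gradient-based criterion of O'Donoghue and Candès, adapted here to maximization; the criterion used in the paper may differ.

```python
import numpy as np

X_STAR = np.array([1.0, -2.0])  # hypothetical maximizer of the toy objective


def noisy_grad(x, rng, noise_std=0.05):
    """Noisy gradient of the toy objective f(x) = -0.5 * ||x - X_STAR||^2.

    Stands in for a stochastic gradient of an expected-information-gain
    estimator; the Gaussian noise model is an assumption.
    """
    return -(x - X_STAR) + noise_std * rng.standard_normal(x.shape)


def nesterov_sgd_restart(grad, x0, step, n_iter, rng):
    """Accelerated stochastic gradient ascent with gradient-based restart."""
    x = np.asarray(x0, dtype=float)  # extrapolated point where the gradient is sampled
    z_prev = x.copy()                # previous iterate
    k = 0                            # momentum counter, reset on restart
    n_restarts = 0
    for _ in range(n_iter):
        g = grad(x, rng)
        z = x + step * g             # ascent step from the extrapolated point
        # Restart when the sampled gradient opposes the direction of travel.
        if np.dot(g, z - z_prev) < 0.0:
            k = 0
            n_restarts += 1
        momentum = k / (k + 3)       # assumed schedule; equals 0 right after a restart
        x = z + momentum * (z - z_prev)
        z_prev = z
        k += 1
    return z_prev, n_restarts


rng = np.random.default_rng(0)
xi_hat, n_restarts = nesterov_sgd_restart(
    noisy_grad, [0.0, 0.0], step=0.05, n_iter=500, rng=rng
)
print("estimate:", xi_hat, "restarts:", n_restarts)
```

In an OED setting, `noisy_grad` would be replaced by a single-sample gradient of one of the estimators named in the abstract (DLMC, MCLA, or DLMCIS) with respect to the design variables. With a constant step size the iterates only settle in a noise-dominated neighborhood of the maximizer, so a decreasing step size is a common alternative.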